Fluorescent proteins and genes encoding them

ABSTRACT

Fluorescent proteins comprising the following internal amino acid sequence

(SEQ ID NO: 47)
Gly Tyr Xaa Xaa Xaa Gln Tyr Leu Pro Xaa Pro
1               5                   10

wherein Xaa in position 3 is Ala or Gly, Xaa in position 4 is Phe, His or Tyr, Xaa in position 5 is His, Tyr or Asn, or Xaa in position 10 is Phe or Tyr, are disclosed. Such proteins are e.g. isolated or recombinant fluorescent proteins from a Cephalochordata, such as Branchiostoma floridae or Branchiostoma lanceolatum, or isolated mutants or recombinant proteins that have at least 80% amino acid sequence identity with the fluorescent proteins. Isolated and purified structural genes encoding such fluorescent proteins are also disclosed.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a divisional of U.S. patent application Ser. No. 12/308,048, filed Dec. 5, 2008, which is a 371 of International Application No. PCT/SE07/00551, filed Jun. 8, 2007, which claims benefit under 35 U.S.C. 119(e) of U.S. Provisional Patent Application Ser. No. 60/811,769 filed Jun. 8, 2006 and claims benefit to Swedish Application No. 0601261-1, filed Jun. 8, 2006, the entire contents of which are incorporated herein by reference.

The present invention relates to a class of new fluorescent proteins and genes encoding them. The invention is particularly concerned with fluorescent proteins from Cephalochordata (also named amphioxus or lancelet) and structural genes encoding such proteins.

BACKGROUND

Fluorescent proteins (FPs), in particular green fluorescent proteins, are commonly used fluorescent markers in molecular biology to monitor gene expression and protein localization in living organisms and in medical diagnostic applications.

Fluorescent proteins are found in a variety of marine organisms ranging from the jellyfish Aequorea victoria, to the Indo-Pacific coral Discosoma. Due to their genetically encoded fluorescence, fluorescent proteins have become most important marker molecules and tools in cell biology. Becoming spontaneously fluorescent without any requirements for cofactors, substrates or other gene products, FPs have revolutionized research in many areas of biology.

During recent years FPs have also gathered strong appreciation as powerful tools for the drug discovery process. As fluorescent probes, FPs enable both real-time and non-invasive reporting in living cells. This ability provides a basis for cell-based monitoring of FP-linked targets upon administration of external drugs. The impact of FPs has been revolutionary; FPs have not only facilitated visualization of intricate cellular architecture but they have also acted as markers of protein dynamics and behavior in cell biology. These applications have been translated to drug discovery, where fluorescent proteins have been utilized in fluorescence and confocal imaging, HTS/HCS screening assays and for in vivo diagnostics. FPs can be used not only in early stage target characterization but also in retrieving non-invasive ‘whole organism’ data and in evaluating lead compound toxicology. Limitations of most fluorescent proteins are generally associated with molecular brightness and/or stability. Moreover, many FPs have additional complications involving protein folding, chromophore maturation and self-association. Although FPs have vastly improved over the years, mainly by introducing mutations, they still exhibit major limitations.

There is an interest in obtaining new fluorescent proteins with different properties compared to known fluorescent proteins. For instance, there are no fluorescent proteins on the market that can be used in paraffin sections at room temperature for immunohistochemical purposes since they lose their fluorescence intensity under such conditions.

DESCRIPTION OF THE INVENTION

The present invention provides a class of new fluorescent proteins with different properties compared to known proteins, e.g. they can be used in paraffin sections at room temperature for immunohistochemical purposes since they retain their fluorescence intensity under such conditions.

One aspect of the invention is directed to an isolated and purified structural gene encoding a fluorescent protein from a Cephalochordata, or encoding a mutant or recombinant protein that has at least 80% amino acid sequence identity with the fluorescent protein, and comprising the internal amino acid sequence

(SEQ ID NO: 47)
Gly Tyr Xaa Xaa Xaa Gln Tyr Leu Pro Xaa Pro
1               5                   10

wherein

-   Xaa in position 3 is Ala or Gly,
-   Xaa in position 4 is Phe, His or Tyr,
-   Xaa in position 5 is His, Tyr or Asn, or
-   Xaa in position 10 is Phe or Tyr.

The term “structural gene” means the protein coding nucleotide sequence of a gene or polynucleotide.

The internal sequence SEQ ID NO:47 is found in all hitherto analyzed proteins of the new class of fluorescent proteins expressed by Cephalochordata, but some amino acid substitution, extension and/or deletion in this sequence may be possible, especially in the positions where there are variations in the amino acids, i.e. where the amino acid is Xaa.
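As an illustration only, the internal sequence SEQ ID NO: 47 can be written as a search pattern over one-letter amino acid codes. The sketch below assumes that the four Xaa constraints are read as applying together, and that the input is a plain protein string; the function name and the two test strings are hypothetical and are not taken from the sequence listing.

```python
import re

# One-letter rendering of SEQ ID NO: 47:
# Gly Tyr Xaa3 Xaa4 Xaa5 Gln Tyr Leu Pro Xaa10 Pro
# Xaa3 = Ala/Gly, Xaa4 = Phe/His/Tyr, Xaa5 = His/Tyr/Asn, Xaa10 = Phe/Tyr
MOTIF = re.compile(r"GY[AG][FHY][HYN]QYLP[FY]P")


def find_motif(protein):
    """Return the 1-based start position of the motif, or None if absent."""
    match = MOTIF.search(protein.upper())
    return match.start() + 1 if match else None


# Hypothetical test strings (not sequences from the listing).
print(find_motif("MSLPKGYGFHQYLPFPDGAA"))  # motif present
print(find_motif("MSLPKGYGFHQYLPWPDGAA"))  # Trp in position 10, so no match
```

An exact pattern of this kind would not detect the substitutions, extensions or deletions that the description allows for; those would call for an alignment-based comparison instead.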

A mutant or recombinant protein that has at least 80% amino acid sequence identity with a fluorescent protein defined in this invention may be truncated and/or have amino acid substitutions, insertions and/or deletions and have any percentage of amino acid identity with regard to the fluorescent protein defined in this invention between 80% and 99.9%, such as at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5% or 99.9% identity.
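The description does not prescribe how percentage identity is to be computed. A minimal sketch, assuming two sequences that have already been aligned (gaps written as "-") and assuming that identity is counted over all alignment columns, could look as follows; other conventions for the denominator exist and give slightly different values.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: identical residues are counted over all alignment columns
    that are not gaps in both sequences.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no alignment columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Two hypothetical variants of the internal sequence, differing in
# positions 3 and 10 (9 of 11 residues identical).
print(round(percent_identity("GYAFHQYLPFP", "GYGFHQYLPYP"), 1))
```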

The Cephalochordata is also named amphioxus or lancelet and is herein exemplified by the species a) Branchiostoma floridae and b) Branchiostoma lanceolatum.

In an embodiment of the invention the structural gene is selected from the group consisting of SEQ ID NOs: 1-23.

In another embodiment of the invention the structural gene is selected from the group consisting of SEQ ID NOs: 48-67.

In still another embodiment of the invention the structural gene is selected from the group consisting of SEQ ID NOs: 88-90.

The nucleotide sequences SEQ ID NOs: 88-90 are examples of isolated and purified structural genes encoding mutant proteins of the invention. The exemplified mutant sequences have the sequence of the wild-type Branchiostoma lanceolatum nucleotide sequence SEQ ID NO: 59 from position 7 including position 666, and some point mutations. The nucleotide A in positions 356 and 357 of the mutant sequences SEQ ID NO: 88-90, respectively, has been inserted instead of the nucleotide C in the wild-type sequence, and further the nucleotide C in position 524 of the mutant sequences SEQ ID NOs: 89 and 90, respectively, has been inserted instead of the nucleotide A in the wild-type sequence. In addition, the mutant sequence SEQ ID NO: 90 has in position 469 the nucleotide A instead of G in the wild-type sequence and in position 471 the nucleotide G instead of C in the wild-type sequence.

Another aspect of the invention is directed to a vector comprising a structural gene according to the invention, such as a structural gene selected from the group consisting of SEQ ID NOs: 1-23, SEQ ID NOs: 48-67 and SEQ ID NOs: 88-90. The vector may be any vector which can comprise a structural gene of the invention and necessary flanking regions with regulatory elements necessary for expression of the desired fluorescent protein of the invention or a fusion protein comprising such a protein according to the invention. The regulatory elements necessary for expression are e.g. a suitable operon or promoter that is natural or foreign to the host selected for expression of the protein. Suitable vectors useful in the present invention are e.g. plasmids, cosmids and virus expression vectors.

Yet another aspect of the invention is directed to a host cell comprising a vector according to the invention or comprising a transgene including a structural gene according to the invention. The transgene should be operably inserted into the genome of the host to express the desired fluorescent protein of the invention or a fusion protein comprising such a protein according to the invention. Suitable host cells are both prokaryotic cells such as Escherichia coli cells, and eukaryotic cells such as mammalian, insect, yeast, and plant cells.

A further aspect of the invention is directed to a fluorescent protein comprising the internal amino acid sequence

(SEQ ID NO: 47)
Gly Tyr Xaa Xaa Xaa Gln Tyr Leu Pro Xaa Pro
1               5                   10

wherein

-   Xaa in position 3 is Ala or Gly,
-   Xaa in position 4 is Phe, His or Tyr,
-   Xaa in position 5 is His, Tyr or Asn, or
-   Xaa in position 10 is Phe or Tyr.

The internal sequence SEQ ID NO:47 may possibly have some amino acid substitution, extension and/or deletion in this sequence, especially in the positions where there are variations in the amino acids, i.e. where the amino acid is Xaa.

In an embodiment of this aspect of the invention the protein of the invention is an isolated or recombinant fluorescent protein from a Cephalochordata, such as from the species a) Branchiostoma floridae or b) Branchiostoma lanceolatum, or an isolated mutant or recombinant protein that has at least 80% amino acid sequence identity with the fluorescent protein, and has the internal amino acid sequence

(SEQ ID NO: 47)
Gly Tyr Xaa Xaa Xaa Gln Tyr Leu Pro Xaa Pro
1               5                   10

wherein

-   Xaa in position 3 is Ala or Gly,
-   Xaa in position 4 is Phe, His or Tyr,
-   Xaa in position 5 is His, Tyr or Asn, or
-   Xaa in position 10 is Phe or Tyr.

In analyzed wild-type proteins according to the invention, the C-terminal Pro in SEQ ID NO: 47 is followed by Asp Gly, Ala Gly, Asp Asp or Gly Gly.

A mutant or recombinant protein that has at least 80% amino acid sequence identity with a fluorescent protein defined in this invention may be truncated and/or have amino acid substitutions, insertions and/or deletions and have any percentage of amino acid identity with regard to the fluorescent protein defined in this invention between 80% and 99.9%, such as at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5% or 99.9% identity.

In an embodiment of the invention the fluorescent protein is a protein that has an amino acid sequence selected from the group consisting of SEQ ID NOs: 24-46.

In another embodiment of the invention the fluorescent protein is a protein that has an amino acid sequence selected from the group consisting of SEQ ID NOs: 67-87.

In still another embodiment of the invention the fluorescent protein has an amino acid sequence selected from the group consisting of SEQ ID NOs: 91-93.

The amino acid sequences SEQ ID NOs: 91-93 are examples of mutant proteins of the invention that have the amino acid sequence SEQ ID NO: 79 of the wild-type Branchiostoma lanceolatum and some point mutations. In the amino acid sequences SEQ ID NOs: 91-93, the Thr of the wild-type protein in position 119 has been replaced by Lys, and further in the sequences SEQ ID NOs: 92 and 93, the amino acid Asn of the wild-type protein in position 175 has been replaced by Thr. Additionally, the amino acid Asp of the wild-type protein in position 157 has been replaced by Lys in the sequence SEQ ID NO: 93.
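The nucleotide positions given for SEQ ID NOs: 88-90 and the amino acid positions given for SEQ ID NOs: 91-93 can be related by simple codon arithmetic. The sketch below assumes that the reading frame starts at nucleotide position 1 of the mutant sequences. The wild-type codons used are inferred from the stated bases and amino acids; in particular, the third base of the Asn codon (taken here as C) is an assumption, since the text does not state it.

```python
# Minimal codon table covering only the codons used in this illustration.
CODON_TO_AA = {"ACC": "Thr", "AAA": "Lys", "AAC": "Asn",
               "GAC": "Asp", "AAG": "Lys"}


def codon_number(nt_pos):
    """1-based codon number containing the 1-based nucleotide position."""
    return (nt_pos - 1) // 3 + 1


def substitute(codon, nt_pos, new_base):
    """Replace the base of `codon` that corresponds to `nt_pos`."""
    i = (nt_pos - 1) % 3
    return codon[:i] + new_base + codon[i + 1:]


# (assumed wild-type codon, [(nucleotide position, new base), ...])
changes = [
    ("ACC", [(356, "A"), (357, "A")]),   # C -> A at positions 356 and 357
    ("AAC", [(524, "C")]),               # A -> C at position 524
    ("GAC", [(469, "A"), (471, "G")]),   # G -> A at 469, C -> G at 471
]

for wild_type, edits in changes:
    mutant = wild_type
    for pos, base in edits:
        mutant = substitute(mutant, pos, base)
    print(codon_number(edits[0][0]),
          CODON_TO_AA[wild_type], "->", CODON_TO_AA[mutant])
```

Under these assumptions the three codon numbers come out as 119, 175 and 157, matching the amino acid positions stated for the mutant proteins.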

The recombinant fluorescent proteins comprised by the present invention may be in monomeric, dimeric or multimeric, such as tetrameric, form.

Use of Fluorescent Proteins According to the Invention

Due to the inherent property of FPs to become spontaneously fluorescent in all organisms and in all types of cells, FPs have become invaluable tools in many biological and medical disciplines. A wide range of applications of FPs has been developed that can be classified into four areas: visualizing/monitoring of organisms, cells, proteins and molecular events. The fluorescent proteins and structural genes encoding them according to the invention may be used in all hitherto known applications of fluorescent proteins, such as those described below.

To visualize an organism, a structural gene encoding a FP can be introduced and, together with appropriate regulatory sequences, become expressed as an inheritable fluorescent marker in a variety of organisms ranging from viruses, bacteria and yeast to plants, fish and mice. For example, infections of viruses and bacteria can be monitored, as well as the survival and spread of genetically modified organisms (GMOs). Since the FPs according to the invention can be functionally expressed in both prokaryotic and eukaryotic cells, and they have excellent stability, brightness and photoresistance, they are expected to be excellent for such applications.

To visualize cells or organelles, a structural gene encoding a FP may be introduced as a transgene in e.g. germ line cells and in vitro cultured cells. For example, monitoring of cell fate/lineages in transgenic animals, of cancer cells in vivo, of wound healing and of neurite outgrowth can be accomplished. Additional examples are marking of organelles (mitochondria, nuclei, etc.) and investigation of cellular compartmentation in plants (for a review see e.g. GFP imaging: methodology and application to investigate cellular compartmentation in plants. J Exp Bot. 2001 April; 52(356):529-39). The properties of the FPs according to the invention enable them to be used for paraffin-embedded and sectioned tissues.

To visualize proteins, a structural gene encoding a FP can be fused to a gene of interest, producing a fusion protein that is tagged by the FP chromophore. The fusion protein can then be monitored in e.g. living cells in real time, thus enabling analyses of cellular localization of individual proteins (numerous examples in the prior art).

Protein-protein interactions can be followed by labeling two different proteins with two different chromophores, and their interaction can be monitored by FRET (Fluorescence resonance energy transfer) or BRET (Bioluminescence Resonance Energy Transfer) in case of a bioluminescent donor to a fluorescent acceptor protein.
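FRET reports on interactions because the transfer efficiency falls off steeply with the donor-acceptor distance. A minimal sketch of the standard Förster relation, E = 1 / (1 + (r/R0)^6), is given below. The Förster radius of 5 nm is a hypothetical placeholder and is not a measured value for any protein pair of the invention.

```python
def fret_efficiency(distance_nm, forster_radius_nm):
    """Förster transfer efficiency for a single donor-acceptor pair."""
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


R0_NM = 5.0  # hypothetical Förster radius, for illustration only

for r in (2.5, 5.0, 10.0):
    print(r, round(fret_efficiency(r, R0_NM), 3))
```

At a distance equal to the Förster radius the efficiency is one half by definition, and it drops to a few percent at twice that distance, which is why a FRET signal is taken as evidence that the two labeled proteins are in close proximity.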

In drug screening protein-drug interactions are studied. Co-localization of fluorescent fusion proteins with intracellular localization markers are used as indicators of movements of intracellular fluorescent fusion proteins/peptides. The aggregation or internalization of fluorescent-tagged plasma membrane proteins (e.g. G-protein coupled receptors) can be used as drug screening assays.

In the literature there are numerous references to the use of FPs and genes encoding them as sensors for different purposes, such as sensors for protease activity:

Detection of MMP activity in living cells by a genetically encoded surface-displayed FRET sensor. Biochim Biophys Acta 2007 March; 1773(3):400-7, Epub 2006 November 11, and Development and application of a GFP-FRET intracellular caspase assay for drug screening. J Biomol Screen. 2000 October; 5(5):307-18;

as sensors for atoms or ions: Genetic oxygen sensor: GFP as an indicator of intracellular oxygenation. Adv Exp Med Biol. 2005; 566:39-44, Elimination of environmental sensitivity in a cameleon FRET-based calcium sensor via replacement of the acceptor with Venus. Cell Calcium. 2005 April; 37(4):341-8, Construction of a whole-cell gene reporter for the fluorescent bioassay of nitrate. Anal Biochem. 2004 May 1; 328(1):60-6, and Transgenic mice expressing a pH and Cl-sensing yellow-fluorescent protein under the control of a potassium channel promoter. Eur J Neurosci. 2002 January; 15(1):40-50;

as sensors for organic molecules: A new green fluorescent protein-based bacterial biosensor for analyzing phenanthrene fluxes. Environ Microbiol. 2006 April; 8(4):697-708, and Live imaging of glucose homeostasis in nuclei of COS-7 cells. J Fluoresc. 2004 September; 14(5):603-9;

as sensors for electrical activity or neural cell activation: A hybrid approach to measuring electrical activity in genetically specified neurons. Nat Neurosci. 2005 November; 8(11):1619-26. Epub 2005 October, and A genetically encoded optical probe of membrane voltage. Neuron. 1997 October; 19(4):735-41;

as sensors for cell cycle: Characterization and gene expression profiling of a stable cell line expressing a cell cycle GFP sensor. Cell Cycle. 2005 January; 4(1):191-5. Epub 2005 January 29;

as sensors for promoters or gene activation: A high-throughput approach to promoter study using green fluorescent protein. Biotechnol Prog. 2004 November-December; 20(6):1634-40;

and as sensors for apoptosis: Degradation of GFP-labelled POM121, a non-invasive sensor of nuclear apoptosis, precedes clustering of nuclear pores and externalization of phosphatidylserine. Apoptosis. 2004 May; 9(3):363-8.

The invention will now be illustrated by description of drawings and of embodiments and experiments of the invention, but it should be understood that the scope of the invention is not limited to any described details.

SHORT DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram that shows emission spectra after excitation at 500-520 nm of four FPs of the invention, namely Green Y-1=SEQ ID NO: 68; Yellow O-1=SEQ ID NO: 74; Orange O-2=SEQ ID NO: 75 and Red R-5=SEQ ID NO: 79.

FIG. 2 is a diagram that shows the fluorescence plotted against time for the reference EGFP and two FPs of the invention, Green FP Y-1=SEQ ID NO: 68; Red FP R-4=SEQ ID NO: 78, indicating that the FPs are resistant to photobleaching.

FIG. 3 is a diagram that shows the fluorescence plotted against pH for the reference EGFP and a red FP, R-4=SEQ ID NO: 78, of the invention, indicating that R-4 is insensitive to pH.

FIG. 4 is a diagram that shows the fluorescence plotted against temperature for the reference EGFP (squares) and Red FP (triangles), R-1=SEQ ID NO: 76, and Green FP (circles), Y-3=SEQ ID NO: 70, of the invention. The fluorescence was recorded after 1 h of incubation (1 min for EGFP) at the indicated temperature.

FIG. 5 is a diagram that shows the fluorescence plotted against concentration of ethanol for the reference EGFP (squares) and Red FP (triangles), R-1=SEQ ID NO: 76, and Green FP (circles), Y-3=SEQ ID NO: 70, of the invention. The fluorescence was recorded after 1 h of incubation at the indicated concentration of ethanol.

FIG. 6 is a diagram that shows absorbance plotted against elution volume in size exclusion chromatography of the wild-type FP wt R-5 (SEQ ID NO: 79), the double mutant Mut B (SEQ ID NO: 92) and the triple mutant Mut C (SEQ ID NO: 93) as well as EGFP.

FIG. 7 is a diagram that shows normalized fluorescence plotted against wavelength in excitation and emission spectra of mutant R-5C, SEQ ID NO: 93. The emission spectra were recorded after excitation at 590 nm.

DESCRIPTION OF EMBODIMENTS AND EXPERIMENTS OF THE INVENTION

Natural Occurrence of Fluorescent Proteins of the Invention

Proteins comprised by the definition of fluorescent proteins according to the invention occur naturally in the Cephalochordata (amphioxus), e.g. Branchiostoma floridae or Branchiostoma lanceolatum. They are expressed in supportive cells in the anteriormost part of the body, e.g., cells in the coelom walls, the subepidermal canals, the oral cirri skeleton, and the oral cirri tufts. However, the number of different positive cell types varies between different individuals.

The proteins are found as either only a single fluorescent protein, or as a mixture of different fluorescent proteins.

All properties have been examined at room temperature unless otherwise stated.

Characteristics of the Fluorescent Proteins of the Invention

The fluorescence characteristics of selected FPs have been determined by confocal laser scanning microscopy on bacterial colonies with samples in 96-well plates and on proteins purified after expression in E. coli. Brightness was calculated as the product of quantum yield and molar extinction coefficient—determined by comparing the Coomassie Brilliant Blue staining intensity, after SDS-PAGE, of known amounts of EGFP (Enhanced Green Fluorescence Protein) with the new FPs. Purified EGFP has been employed as reference fluorescent protein.
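The brightness calculation described above is a single multiplication. The sketch below assumes the convention of Ref. 1 (Shaner et al.), in which the product of the molar extinction coefficient and the quantum yield is divided by 1000, giving units of (mM·cm)^-1. The extinction coefficients and quantum yields shown for EGFP and mCherry are literature values quoted from that reference, not measurements reported in this specification.

```python
def brightness(extinction_coefficient, quantum_yield):
    """Brightness in (mM*cm)^-1.

    extinction_coefficient: molar extinction coefficient in M^-1 cm^-1
    quantum_yield: fluorescence quantum yield (0 to 1)
    """
    return extinction_coefficient * quantum_yield / 1000.0


# Literature values (assumed from Ref. 1, not from this specification).
reference = {
    "EGFP": (56000, 0.60),
    "mCherry": (72000, 0.22),
}

for name, (epsilon, qy) in reference.items():
    print(name, round(brightness(epsilon, qy)))
```

With these inputs the rounded results agree with the brightness values of 34 and 16 listed for EGFP and mCherry in the Table.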

Optical Properties of Fluorescent Proteins of the Invention

The purified proteins are yellow to orange in solution, and yellow to red in solid state.

The absorption maxima for the purified native proteins are at 210 nm (peptide bonds), 260 nm (aromatic amino acid residues), and 484-490 nm (fluorophore). Upon denaturation, the peak at 260 nm is shifted to 280 nm, and the fluorophore peak is almost completely lost.

The emission spectrum of fluorescent cells in situ and of purified proteins consists of peaks at 485, 500, 515, 530, 545, 560, 575, 590, 610 and 635 nm. The number of peaks varies between individuals and between different cell types in the same individual. The absorption maxima for each individual protein might be slightly shifted from these values due to overlaying of neighbouring peaks.

Excitation at 458 nm results in peaks at 485 and 500 nm (primary excitation), and 545-635 nm (presumably fluorescence resonance energy transfer, FRET). Excitation at 476 or 486 nm results in peaks at 515 and 530 nm (primary excitation), and 545-635 nm (presumably FRET). Excitation at 514 nm results in peaks at 545-635 nm (primary excitation).

The excitation spectra of all analyzed new FPs are very similar, with maxima around 510-520 nm, in accordance with the identity of the chromophore-forming residues, but in spite of this similarity the emission spectra differ considerably, as shown in FIG. 1. This property thus suggests a new mechanism for generation of colour diversity as compared to other known fluorescent proteins.

Resistance to Bleaching

The fluorescence has a bleaching resistance in strong light similar to that of the most resistant fluorescent proteins (EGFP (Enhanced Green Fluorescence Protein) and Emerald).

The photostability, i.e. resistance to bleaching, was determined on proteins in solutions (Tris-HCl, pH 7.5) that were placed between a cover slip and a microscope slide and exposed to the highest possible light intensity in a fluorescent microscope (mercury lamp, 100× oil immersion). As shown in FIG. 2, the new FPs are very resistant to bleaching and display an even greater photostability than EGFP.

Maturation and Stokes Shift

Like other red fluorescent proteins, the new red variants go through a maturation stage before reaching the mature form that emits red light. In contrast to many other wild-type red FPs, the maturation step is quite fast, as red fluorescence can be observed in growing E. coli colonies.

Another interesting feature of the red variants is the very large difference between the absorption and emission maxima: the Stokes shift is more than 100 nm.

Insensitivity to Environmental Effects

The fluorescence is pH dependent. Fluorescence occurs in weakly acidic, neutral and basic solutions but not in acidic and strongly basic solutions. The green fluorescence has a maximum at pH 10, at least 50% fluorescence between pKa1 and pKa2 (limits for >50% of maximal fluorescence; pKa1=7.7, and pKa2=11.6), and has almost a linear dependence of pH between pH 6-9.
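One simple way to picture a fluorescence profile bounded by two apparent pKa values is a two-transition titration model, in which the signal is switched on by one ionization and switched off by another. This model is an assumption made purely for illustration; the description states only the two pKa values and the approximate position of the maximum, not a functional form.

```python
PKA1 = 7.7    # lower limit for >50% of maximal fluorescence (from the text)
PKA2 = 11.6   # upper limit for >50% of maximal fluorescence (from the text)


def relative_fluorescence(ph, pka1=PKA1, pka2=PKA2):
    """Assumed bell-shaped model: product of two Henderson-Hasselbalch terms."""
    switched_on = 1.0 / (1.0 + 10.0 ** (pka1 - ph))
    not_quenched = 1.0 / (1.0 + 10.0 ** (ph - pka2))
    return switched_on * not_quenched


for ph in (6.0, 7.7, 9.65, 11.6, 13.0):
    print(ph, round(relative_fluorescence(ph), 3))
```

In this model the signal is close to one half at each pKa and peaks midway between them, at about pH 9.65, which is near the stated maximum at pH 10. The model does not reproduce the nearly linear dependence reported between pH 6 and 9.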

The new FPs are extremely stable and insensitive to environmental effects. The pH sensitivity has been analyzed with a red variant (R-4=SEQ ID NO: 78) and as shown in FIG. 3, the fluorescence of R-4 is maintained over a wider pH range than the fluorescence of EGFP.

Physical Properties of Fluorescent Proteins of the Invention

The apparent molecular weight of a native, proteinase K-treated protein is about 25 kDa (gel filtration) and that of the denatured protein is about 30-35 kDa (SDS-PAGE). The green fluorescent proteins are slightly larger than the red ones. The protein oligomerizes and predominantly forms tetramers. However, trimers and pentamers, hexamers, dimers and octamers, and polymers are also present (in decreasing order of frequency).

Chemical Properties of Fluorescent Proteins of the Invention

The proteins are soluble in water, phosphate buffer and Tris buffer but not in acetone, ethanol, glycerol or xylene. The proteins can be precipitated with acetone or ethanol but not with ammonium sulfate (80% of saturated solution).

The fluorescence is lost upon denaturation.

The fluorescence resists proteinase K (0.1-1 mg/ml at 45° C. overnight), detergents (10% sodium dodecyl sulfate, and 0.1% Triton X-100), aldehyde fixation (formaldehyde, and glutaraldehyde), chelating agents (1 M EDTA), many organic solvents (acetone, ethanol, glycerol, melted paraffin, and xylene), high salinity (80% saturated ammonium sulfate, 4 M sodium chloride, and saturated disodium hydrogen phosphate), low salinity (distilled water), heavy metal ions (copper chloride, lead nitrate, and silver nitrate), weak oxidizing agents (hydrogen peroxide, oxygen in air, potassium chromate in neutral solution, potassium dichromate in neutral solution, potassium ferricyanide, silver nitrate, and sodium chlorate in neutral solution), weak reducing agents (10 mM dithiothreitol, and pyrogallol in neutral solution), and moderately high temperatures (45° C. for 12 h, 65° C. for 1 h).

The fluorescence is destroyed by some organic solvents (e.g. benzyl alcohol-benzyl benzoate mixture), strong oxidizing agents (iodine, periodic acid, potassium chromate in acid solution, potassium dichromate in acid solution, potassium permanganate, sodium chlorate in alkaline solution, and sodium hypochlorite in alkaline solution), strong reducing agents (pyrogallol in alkaline solution), and very high temperature (98° C.).

Stability of the Fluorescent Proteins

The stability of the FPs has been analyzed both in situ and using purified proteins after expression in E. coli. As shown in FIGS. 4 and 5, two of the recombinantly expressed proteins of the invention, Red FP (R-1=SEQ ID NO: 76) and Green FP (Y-3=SEQ ID NO: 70), are very thermostable and withstand high concentrations of ethanol. The new FPs are also stable in e.g. 6 M guanidine hydrochloride, and withstand many organic solvents, making them useful in histochemical applications including those using paraffin-embedded and sectioned tissues.

Bright Red FPs

The brightness (product of quantum yield and molar extinction coefficient) has been calculated for two of the new red variants, and they are clearly among the brightest red wild-type proteins ever found. The Table below summarizes the fluorescence characteristics of selected members of the FPs of the invention.

TABLE

Fluorescence characteristics of selected FPs of the invention and a selection of reference proteins (from Ref. 1).

Class          Protein       Excitation   Emission    Brightness   Photostability
                             max (nm)     max (nm)
Far-red        mPlum         590          649         4.1          53
Red            mCherry       587          610         16           96
               mStrawberry   574          596         26           15
               DsRed-mono    556          586         3.5          16
               R-4           520          615         20           ND
               R-5           520          620         20           180
Orange         mOrange       548          562         49           9
               mKO           548          559         31           122
               O-1           530          545         ND           ND
               O-2           530          560         ND           ND
Yellow-green   Venus         515          528         53           15
               EYFP          514          527         51           60
               Y-1           518          530         ND           180
               Y-3           518          530         ND           ND
Green          EGFP          489          507         34           174

ND: not determined
R-4 = SEQ ID NO: 78; R-5 = SEQ ID NO: 79; O-1 = SEQ ID NO: 74; O-2 = SEQ ID NO: 75; Y-1 = SEQ ID NO: 68; Y-3 = SEQ ID NO: 70.
Ref. 1 = Shaner, N. C. et al. A guide to choosing fluorescent proteins. Nature Methods 2 (12) 905-909, 2005.

Isolating Fluorescent Proteins from Cephalochordata

Several hundred specimens of lancelet, i.e. Cephalochordata, also named amphioxus, are collected. In this example Branchiostoma lanceolatum were collected. Their heads were cut off and mixed with an equal volume of a neutral buffer solution (e.g., 10 mM Tris, pH 7.5, 10 mM NaCl). The mixture was digested with proteinase K (final concentration of 0.1 mg/ml) at 40° C. overnight, followed by centrifugation for 10 min at 16,000 rpm to remove any remaining debris. The supernatant was loaded on a Sephadex G-200 column (30-100 cm; equilibrated with the same buffer as used during the digestion) and gel filtered. Fluorescent fractions were collected, pooled, and precipitated by addition of 1.8 volumes of acetone and centrifugation for 1 min at 16,000 rpm. The supernatant was discarded. The pellet was washed with 65% acetone, briefly dried (allowing remaining acetone to evaporate), dissolved in water, denatured with sodium dodecyl sulfate (SDS; final concentration of 1%) and dithiothreitol (DTT; final concentration 100 mM) at 95° C. for 3 min, and loaded on a 2.5% SDS-PAGE gel. After the completed gel electrophoresis and Coomassie staining, the two bands at around 30 kDa were cut out, eluted, and digested into fragments with trypsin. The fragments are analyzed with tandem mass spectrometry using MALDI-TOF (matrix-assisted laser desorption/ionization time of flight) to obtain their amino acid sequence. Degenerate oligonucleotide primers are designed from these amino acid sequences. These primers are used for 5′-RACE PCR (rapid amplification of cDNA ends polymerase chain reaction) on cDNA that is prepared from purified mRNA from lancelet heads. The PCR products are size separated by agarose electrophoresis. The different bands are cut out, purified, cloned, and sequenced. Oligonucleotide primers are designed from the obtained sequences and used for 3′-RACE PCR on the same cDNA to obtain the complete coding region. The PCR products are cloned into an expression vector, and fluorescent transformants are selected and sequenced. All steps are performed at room temperature unless otherwise stated.
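When degenerate primers are designed from peptide sequences, one common consideration is how many distinct nucleotide sequences a primer pool must contain. The sketch below counts this degeneracy from the number of codons per amino acid in the standard genetic code and picks the least degenerate stretch of a peptide. The function names and the example peptide are hypothetical; the description does not state which peptide stretches were actually used.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code.
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def degeneracy(peptide):
    """Number of distinct nucleotide sequences that encode the peptide."""
    return prod(CODON_COUNT[aa] for aa in peptide.upper())


def least_degenerate_window(peptide, length):
    """Return (window, degeneracy) for the least degenerate stretch."""
    if length > len(peptide):
        raise ValueError("window is longer than the peptide")
    candidates = [(degeneracy(peptide[i:i + length]), i)
                  for i in range(len(peptide) - length + 1)]
    fold, start = min(candidates)  # ties resolve to the earliest window
    return peptide[start:start + length], fold


# Hypothetical tryptic peptide, for illustration only.
print(least_degenerate_window("GYGFHQYLPFPDGK", 6))
```

Stretches rich in Met, Trp and the two-codon amino acids give the smallest primer pools, whereas Leu, Ser and Arg, each with six codons, increase the degeneracy quickly.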

Isolating Fluorescent Proteins from Genome Project

The known amino acid sequences of green fluorescent proteins (GFP) from copepods (obtained from GenBank) are used for searches through the unassembled trace files obtained from genomic sequencing of Florida lancelet (Branchiostoma floridae) (available at http://www.ensembl.org/). Nucleotide sequences containing putative GFP-like exons were selected. These sequences were elongated by successive searches with the sequences for new matching sequences and alignments of these new sequences to those already found. This is repeated for many cycles until either complete genes are obtained, or no more new sequences are found. The assembled genes are analyzed by splicing prediction software [NetGene2 software (http://www.cbs.dtu.dk/services/NetGene2/)] and putative exons are converted into amino acid sequences. The obtained nucleotide sequences are analyzed for conserved regions. Degenerate oligonucleotide primers are designed for two different conserved regions (located in exon 3 and exon 6). These primers are used for nested 5′-RACE PCR on cDNA that is prepared from purified mRNA from lancelet heads. The PCR products are size separated by agarose electrophoresis. The different bands are cut out, purified, cloned, and sequenced. Oligonucleotide primers are designed from the obtained sequences and used for 3′-RACE PCR on the same cDNA to obtain the complete coding region. The PCR products are cloned into an expression vector, and fluorescent transformants are selected and sequenced.
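The cyclic elongation described above, searching for new matching sequences and adding them until nothing new is found, can be sketched as a greedy extension over overlapping reads. This is a toy illustration with exact suffix-prefix matching on short made-up strings; it extends in one direction only, and the actual searches were presumably done with sequence similarity tools that tolerate mismatches.

```python
def extend_right(seed, reads, min_overlap=4):
    """Greedily extend `seed` with reads whose prefix matches its suffix.

    Repeats until a full pass over the unused reads adds no new bases.
    """
    contig = seed
    used = set()
    extended = True
    while extended:
        extended = False
        for index, read in enumerate(reads):
            if index in used:
                continue
            # Try the longest possible overlap first.
            longest = min(len(contig), len(read))
            for k in range(longest, min_overlap - 1, -1):
                if contig.endswith(read[:k]):
                    used.add(index)
                    if k < len(read):        # the read contributes new bases
                        contig += read[k:]
                        extended = True
                    break
    return contig


# Hypothetical seed and trace reads, for illustration only.
seed = "ATGGCGTAC"
reads = ["GTACCTGA", "CTGATTTAG", "GGGGCCCC"]
print(extend_right(seed, reads))
```

The loop stops under the same condition the text gives: a cycle in which no more new sequence is found. Reads that never overlap the growing sequence, such as the third one here, are simply left unused.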

Cloning and Sequencing

RNA was prepared, and full-length cDNA clones were subsequently obtained through RT-PCR using degenerate PCR primers, followed by 5′- and 3′-RACE and fluorescence screening of E. coli colonies. The obtained full-length clones represent yellow/green, orange and red FPs. No blue or pure green FPs were among the full-length clones, but a number of incomplete clones were sequenced that may represent these colours.

In total some 40 unique sequences encoding novel fluorescent proteins have been obtained so far. The individual proteins are structurally closely related and although distinct from other known fluorescent proteins they clearly belong to the same superfamily.

Mutant Proteins

a. Aggregation State

The wild-type fluorescent proteins have a tendency to self-aggregate and form tetramers. Among the limited number of new FPs that we have analyzed so far, no monomeric variant has been observed, but the tendency to self-aggregate appears to vary among individual variants. In order to generate a pure monomeric variant, which would be advantageous for certain applications, the protein R-5 (SEQ ID NO: 79) was subjected to limited mutagenesis after extensive 3-D modelling. Three amino acid residues were chosen as suitable candidates for mutagenesis, and three mutants with single, double and triple mutations were constructed and custom made by an external laboratory. After expression in E. coli and purification, their aggregation states were analyzed by gel chromatography on a HiPrep Sephacryl S-200 column. The conditions used (i.e. high protein concentrations) were chosen to promote aggregation, and EGFP, which forms a weak dimer, eluted as a peak with an apparent molecular weight of ˜60 kDa, i.e. as a dimer. As shown in FIG. 6, the double mutant of R-5 (Mut B), SEQ ID NO: 92, also eluted as a dimer whereas the triple mutant (Mut C), SEQ ID NO: 93, eluted with an apparent molecular weight of ˜30 kDa, i.e. as a monomer.
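
The assignment of an aggregation state from an apparent molecular weight can be sketched as simple arithmetic. The sketch assumes a monomer weight of about 30 kDa, taken from the monomeric peak reported above, and ignores the column calibration and molecular shape effects that influence real gel chromatography results. The function name and the default value are hypothetical.

```python
def estimate_subunits(apparent_kda, monomer_kda=30.0):
    """Estimate the number of subunits from an apparent molecular weight.

    monomer_kda defaults to ~30 kDa, an assumption based on the monomeric
    peak described in the text; it is not a calibrated value.
    """
    if apparent_kda <= 0 or monomer_kda <= 0:
        raise ValueError("molecular weights must be positive")
    # Round the weight ratio to the nearest whole number, at least one subunit.
    return max(1, round(apparent_kda / monomer_kda))


STATE_NAMES = {1: "monomer", 2: "dimer", 4: "tetramer"}

# Apparent weights reported in the text: ~60 kDa and ~30 kDa.
dimer_peak = STATE_NAMES[estimate_subunits(60)]
monomer_peak = STATE_NAMES[estimate_subunits(30)]
```

With these inputs the ˜60 kDa peak is classified as a dimer and the ˜30 kDa peak as a monomer, consistent with the interpretation given for EGFP, Mut B and Mut C.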

b. Characteristics of Mutant Fluorescent Proteins

The dimeric mutant (Mut B), SEQ ID NO: 92, has a diminished tendency to self-aggregate, but folds less efficiently than the wild-type protein. The optical properties appear to be the same as those of the wild-type R-5 (SEQ ID NO: 79) protein. Neither the excitation-emission spectra nor the brightness or stability appears different from the wild-type protein. However, the monomeric mutant (Mut C), SEQ ID NO: 93, has lost much of the brightness of the wild-type protein and, probably due to a maturation defect, has its major emission peak shifted to the orange part of the spectrum. Interestingly, the mutants, including the monomeric variant, also have a new emission peak with a maximum at ˜660 nm with excitation at 590 nm. The emission peak is quite wide and, although the overall intensity is low, there is a clear emission up to 750 nm, thus extending into the infrared part of the spectrum (FIG. 7), which is a clear benefit for in vivo applications.

1. A fluorescent protein having at least 80% amino acid sequence identity with an amino acid sequence selected from the group consisting of SEQ ID NOs: 34, 40, 41, 46, 70-72, and 74, wherein the fluorescent protein comprises the internal amino acid sequence (SEQ ID NO: 47)

Gly Tyr Gly Phe His Gln Tyr Leu Pro Xaa Pro
1               5                   10

and wherein Xaa in position 10 is Phe or Tyr.

2. The fluorescent protein of claim 1, wherein the fluorescent protein has at least 90% amino acid sequence identity with an amino acid sequence selected from the group consisting of SEQ ID NOs: 34, 40, 41, 46, 70-72, and 74.

3. The fluorescent protein of claim 1, wherein the fluorescent protein has at least 95% amino acid sequence identity with an amino acid sequence selected from the group consisting of SEQ ID NOs: 34, 40, 41, 46, 70-72, and 74.

4. The fluorescent protein of claim 1, consisting of SEQ ID NO: 70.

5. An isolated and purified structural gene encoding the fluorescent protein of claim 1.

6. The structural gene of claim 5, wherein the fluorescent protein encoded by the structural gene has at least 90% amino acid sequence identity with an amino acid sequence selected from the group consisting of SEQ ID NOs: 34, 40, 41, 46, 70-72, and 74.

7. The structural gene of claim 6, wherein the fluorescent protein encoded by the structural gene has at least 95% amino acid sequence identity with an amino acid sequence selected from the group consisting of SEQ ID NOs: 34, 40, 41, 46, 70-72, and 74.

8. A vector comprising the structural gene of claim 5.

9. A vector comprising the structural gene of claim 6.

10. A vector comprising the structural gene of claim 7.

11. A host cell comprising the vector of claim 8.

12. A host cell comprising the vector of claim 9.

13. A host cell comprising the vector of claim 10.

14. A host cell comprising a transgene, wherein the transgene comprises the structural gene of claim 5.

15. A host cell comprising a transgene, wherein the transgene comprises the structural gene of claim 6.

16. A host cell comprising a transgene, wherein the transgene comprises the structural gene of claim 7.
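
For illustration only, the internal amino acid sequence recited in claim 1 can be expressed as a pattern and searched for in a candidate protein sequence. The sketch below is an assumption-laden aid for the reader and forms no part of the claims: the function names are hypothetical, it uses the standard one-letter amino acid codes, and it does not assess the sequence identity requirement, which would need a proper alignment.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Gly Tyr Gly Phe His Gln Tyr Leu Pro Xaa Pro, with Xaa (position 10) = Phe or Tyr.
CLAIM_1_MOTIF = re.compile(r"GYGFHQYLP[FY]P")


def to_one_letter(three_letter_seq):
    """Convert a space-separated three-letter sequence to one-letter codes."""
    return "".join(THREE_TO_ONE[residue] for residue in three_letter_seq.split())


def contains_claim_1_motif(protein_one_letter):
    """Return True if the one-letter protein sequence contains the motif."""
    return CLAIM_1_MOTIF.search(protein_one_letter.upper()) is not None


# Example with Xaa = Phe; the flanking residues are arbitrary placeholders.
example = "MS" + to_one_letter("Gly Tyr Gly Phe His Gln Tyr Leu Pro Phe Pro") + "KL"
has_motif = contains_claim_1_motif(example)
```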